In the Hopfield model, the ability of the network to generalize is studied for the case of a network trained on one input image (the standard).



Basic Model


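The maximization problem with a symmetric connection matrix is considered:

$$F(\sigma) = \sum_{i,j=1}^{n} J_{ij}\,\sigma_i\sigma_j \to \max, \qquad \sigma_i = \pm 1 \ \ \forall i, \qquad (1)$$

$$J_{ij} = J_{ji}; \ \text{the diagonal elements } J_{ii} \text{ do not matter, since } \sigma_i^2 = 1.$$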

The configuration vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ providing the solution of the problem is called the ground state. For every $J$ there exists a Hebbian-like representation

$$J = S^{T} S, \qquad (2)$$

where $S$ is a real $(p \times n)$-matrix and $p = \operatorname{rank} J$.



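We would like to investigate the special case of the $(p \times n)$-matrix $S$:

$$S = \begin{pmatrix}
1-x & 1 & \cdots & 1 & 1 & \cdots & 1\\
1 & 1-x & \cdots & 1 & 1 & \cdots & 1\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
1 & 1 & \cdots & 1-x & 1 & \cdots & 1
\end{pmatrix}, \qquad x \in \mathbb{R}, \quad p + q = n, \qquad (3)$$

where the first $p$ columns carry $1-x$ on the diagonal and the remaining $q$ columns consist of ones.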
The rows of the matrix S are the generalized memorized patterns. 



The meaningful interpretation of the problem: the network had to be trained by showing the standard $\varepsilon(n)$ $p$ times,

$$\varepsilon(n) = (1, 1, \ldots, 1) \in \mathbb{R}^n,$$

but an error crept into the learning process, and the network was trained on $p$ distorted copies $s^{(l)}$ of the standard:

$$s^{(l)} = (1, \ldots, 1, \underbrace{1-x}_{l}, 1, \ldots, 1), \qquad l = 1, 2, \ldots, p. \qquad (4)$$



The real number x is called the distortion parameter. 

The problem under investigation, Eqs. (1)-(3), is very close to the problem of generalization in the case of one embedded pattern.
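For small $n$ the problem (1)-(4) can be solved by exhaustive search, which gives a way to check the results below. The following is a minimal plain-Python sketch; the names `pattern_matrix`, `energy` and `ground_states` are hypothetical. It uses the identity $F(\sigma) = \sigma^T S^T S \sigma = \sum_l (s^{(l)}, \sigma)^2$ and, because $F(\sigma) = F(-\sigma)$, fixes $\sigma_n = +1$ (an undistorted coordinate when $q \ge 1$), so that each maximizer is listed once, matching the convention of Eq. (5).

```python
from itertools import product


def pattern_matrix(p, n, x):
    """Rows s^(l) of Eq. (4): ones everywhere except 1 - x at position l (l < p)."""
    return [[1 - x if i == l else 1 for i in range(n)] for l in range(p)]


def energy(S, sigma):
    """F(sigma) = sum_ij J_ij sigma_i sigma_j with J = S^T S,
    evaluated as the sum over patterns of (s^(l) . sigma)^2."""
    return sum(sum(s * t for s, t in zip(row, sigma)) ** 2 for row in S)


def ground_states(S, n, tol=1e-9):
    """Exhaustive search over {+1, -1}^n with sigma_n fixed to +1 (small n only)."""
    best, argmax = float("-inf"), []
    for head in product((1, -1), repeat=n - 1):
        sigma = head + (1,)  # F(sigma) == F(-sigma), so fix the last coordinate
        f = energy(S, sigma)
        if f > best + tol:
            best, argmax = f, [sigma]
        elif abs(f - best) <= tol:
            argmax.append(sigma)
    return best, argmax


# p = 3 distorted copies of the standard, n = 6 (so q = 3), weak distortion
best, states = ground_states(pattern_matrix(3, 6, 0.5), 6)
print(best, states)  # 90.75 [(1, 1, 1, 1, 1, 1)]
```

The cost grows as $2^{n-1}$, so this is only a check for toy sizes, not a way to study large networks.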

Main Results

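1°. The local maxima of the functional $F(\sigma)$ necessarily have the form

$$\sigma^* = (\underbrace{\sigma_1, \sigma_2, \ldots, \sigma_p}_{\sigma'}, 1, \ldots, 1), \qquad (5)$$

and

$$F(\sigma^*) \propto x^2 - 2x\,(q + p\cos w)\cos w + (q + p\cos w)^2, \qquad (6)$$

where $w$ is the angle between the vectors $\sigma'$ and $\varepsilon(p) = (1, 1, \ldots, 1) \in \mathbb{R}^p$:

$$\cos w = \frac{(\sigma', \varepsilon(p))}{\|\sigma'\|\,\|\varepsilon(p)\|} = \frac{1}{p}\sum_{i=1}^{p}\sigma_i. \qquad (7)$$

Hence the vectors $\sigma^*$ (5) whose $p$-dimensional parts $\sigma'$ are equidistant from $\varepsilon(p)$ provide the same value of the functional $F$.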
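A quick numerical check of Eqs. (5)-(7), written as a plain-Python sketch with hypothetical function names. For $\sigma^*$ of the form (5) each overlap is $(s^{(l)}, \sigma^*) = (q + p\cos w) - x\sigma_l$, so summing the squares over $l$ gives $F(\sigma^*) = p\,[x^2 - 2x(q + p\cos w)\cos w + (q + p\cos w)^2]$; the proportionality constant in (6) is therefore $p$. The code compares this closed form with the direct sum for every $\sigma' \in \{\pm 1\}^p$.

```python
from itertools import product


def energy_direct(p, n, x, sigma):
    """F(sigma) = sum_l (s^(l) . sigma)^2 for the patterns of Eq. (4)."""
    total = 0.0
    for l in range(p):
        row = [1 - x if i == l else 1 for i in range(n)]
        total += sum(a * b for a, b in zip(row, sigma)) ** 2
    return total


def energy_closed_form(p, n, x, sigma_p):
    """p * [x^2 - 2x(q + p cos w) cos w + (q + p cos w)^2], Eqs. (6)-(7)."""
    q = n - p
    cos_w = sum(sigma_p) / p  # Eq. (7)
    a = q + p * cos_w
    return p * (x * x - 2 * x * a * cos_w + a * a)


def check_closed_form(p, n, x, tol=1e-9):
    """True if the closed form matches the direct sum for all sigma* of form (5)."""
    for sigma_p in product((1, -1), repeat=p):
        sigma = sigma_p + (1,) * (n - p)
        if abs(energy_direct(p, n, x, sigma) - energy_closed_form(p, n, x, sigma_p)) > tol:
            return False
    return True


print(check_closed_form(3, 7, 0.8), check_closed_form(4, 6, -2.5))  # True True
```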

2°. Evidently,

$$\cos w = \cos w_k = 1 - \frac{2k}{p}, \qquad k = 0, 1, \ldots, p,$$

and the vectors $\sigma^*$ (5) are grouped into $p + 1$ classes $\Sigma_k$ on which the functional $F(\sigma^*)$ is constant:

$$\Sigma_k = \{\sigma^* \mid \text{exactly } k \text{ coordinates of } \sigma^* \text{ are equal to } -1\}.$$

The number of vectors $\sigma^*$ in the class $\Sigma_k$ is equal to $\binom{p}{k}$.

3°. To find how the ground state depends on $x$, it is necessary to analyze the family of straight lines (the term $x^2$ in (6) is the same for all classes and is dropped)

$$L_k(x) = (q + p\cos w_k)^2 - 2x\,(q + p\cos w_k)\cos w_k, \qquad k = 0, 1, \ldots, p. \qquad (8)$$

In the region where $L_k(x)$ majorizes all the other straight lines, the ground state belongs to the class $\Sigma_k$ and is $\binom{p}{k}$-fold degenerate.
4°. Theorem. When $x$ increases from $-\infty$ to $\infty$, the ground state belongs consecutively to the classes

$$\Sigma_0 \to \Sigma_1 \to \cdots \to \Sigma_{k_{\max}}.$$

The $k$th rebuilding of the ground state ($\Sigma_{k-1} \to \Sigma_k$) occurs at the point $x_k$ of the intersection of the straight lines $L_{k-1}(x)$ and $L_k(x)$:

$$x_k = p\,\frac{n - 2k + 1}{n + p - 2(2k - 1)}, \qquad k = 1, 2, \ldots, k_{\max}, \qquad (9)$$

where

$$k_{\max} = \min\left(p, \left[\frac{n + p + 2}{4}\right]\right).$$

The functional has no other local maxima.
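The Theorem can be checked by tracking the upper envelope of the lines (8). The sketch below (plain Python with exact `fractions.Fraction` arithmetic; all function names are hypothetical) computes $x_k$ from Eq. (9), compares it with the intersection of $L_{k-1}$ and $L_k$, and finds which class wins at a given $x$. The number of rebuildings is counted here as the number of $k \le p$ with a strictly positive denominator $n + p - 2(2k-1)$ in (9), because that denominator is proportional to the slope difference of $L_k$ and $L_{k-1}$; when $(n+p+2)/4$ is an integer, the two lines with $k = (n+p+2)/4$ are parallel and do not intersect.

```python
from fractions import Fraction


def lines(p, n):
    """(intercept, slope) of L_k(x) = a_k^2 - 2x a_k c_k, Eq. (8), for k = 0..p."""
    q = n - p
    result = []
    for k in range(p + 1):
        c = Fraction(p - 2 * k, p)  # cos w_k = 1 - 2k/p
        a = q + p * c               # q + p cos w_k = n - 2k
        result.append((a * a, -2 * a * c))
    return result


def x_k(p, n, k):
    """Rebuilding point of Eq. (9)."""
    return Fraction(p * (n - 2 * k + 1), n + p - 2 * (2 * k - 1))


def intersection(p, n, k):
    """x where L_{k-1}(x) = L_k(x), computed from the lines themselves."""
    (b0, m0), (b1, m1) = lines(p, n)[k - 1], lines(p, n)[k]
    return (b1 - b0) / (m0 - m1)


def best_class(p, n, x):
    """Index k of the line that majorizes the others at x (the ground-state class)."""
    vals = [b + m * x for b, m in lines(p, n)]
    return max(range(p + 1), key=vals.__getitem__)


def k_max(p, n):
    """Number of k <= p with a positive denominator in Eq. (9)."""
    return sum(1 for k in range(1, p + 1) if n + p - 2 * (2 * k - 1) > 0)


p, n = 3, 6
print([x_k(p, n, k) for k in range(1, k_max(p, n) + 1)], k_max(p, n))
# [Fraction(15, 7), Fraction(3, 1)] 2
```

Exact fractions are used so that the comparison of $x_k$ with the line intersection is an equality test rather than a floating-point tolerance check.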



The Theorem relates the quality of the learning of the network to the value of the distortion $x$ during the learning stage and to the length $p$ of the learning sequence. It is reasonable that the error of the network increases with the distortion $x$: when $x \in (x_k, x_{k+1})$, the class $\Sigma_k$ ("the truth" understood by the network) differs from the standard $\varepsilon(n)$ in $k$ coordinates (for other interpretations, see below).
Fig. 1 shows the typical behavior of the straight lines $L_k(x)$. The rebuildings of the ground state occur at the points $x_k$ of the intersection of the straight lines $L_{k-1}$ and $L_k$. Inside the interval $(x_k, x_{k+1})$ the ground state belongs to the class $\Sigma_k$. When $x$ increases, either

a) all the rebuildings of the ground state occur: $k_{\max} = p$; or

b) only $k_{\max} = \left[\frac{n+p+2}{4}\right] < p$ rebuildings of the ground state occur.



Generalizations of the Basic Model

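1°. When the standard $\varepsilon(n)$ is replaced by an arbitrary configuration vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, $\alpha_i = \pm 1$, the Theorem remains valid, but the vectors $\sigma^*$ (5) take the form

$$\sigma^* = (\alpha_1\sigma_1, \alpha_2\sigma_2, \ldots, \alpha_p\sigma_p, \alpha_{p+1}, \ldots, \alpha_n).$$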



2°. When we rotate the memorized patterns (4) as a whole, all of their first $p$ coordinates are distorted. Suppose the rotation matrix $U = (u_{ij})$ transforms only the first $p$ coordinates of $n$-dimensional vectors:

$$U = \begin{pmatrix} (u_{ij})_{i,j=1}^{p} & 0 \\ 0 & I_q \end{pmatrix}, \qquad (10)$$

$$u_i = \sum_{l=1}^{p} u_{il}, \quad i = 1, 2, \ldots, p; \qquad \|u\|^2 = p.$$



Then the memorized patterns take the form:

$$s^{(l)} = (u_1 - x u_{1l},\ u_2 - x u_{2l},\ \ldots,\ u_p - x u_{pl},\ 1, \ldots, 1), \qquad l = 1, 2, \ldots, p.$$

It is easy to see that if the standard $\varepsilon(n)$ does not change under the rotation ($u_i = 1$), all the results of the "Basic Model" remain unchanged. More interesting is the case when the standard $\varepsilon(n)$ shifts under the rotation ($u_i \ne 1$). Again the only important configuration vectors are $\sigma^*$ (5), and the functional $F(\sigma^*)$ is given by the same expression (6). But now $w$ is the angle between the vectors $\sigma'$ and $u$:

$$\cos w = \frac{(\sigma', u)}{\|\sigma'\|\,\|u\|} = \frac{1}{p}\sum_{i=1}^{p} u_i\sigma_i. \qquad (11)$$

The vectors $\sigma^*$ are grouped into the classes $\Sigma_k^{(u)}$ on which $F(\sigma^*)$ is constant:

$$\Sigma_k^{(u)} = \{\sigma^* \mid \text{the } p\text{-dimensional parts } \sigma' \text{ are equidistant from } u\}.$$

The number of different classes $\Sigma_k^{(u)}$ is determined by the number of different values of $\cos w$ (11):

$$\cos w_0 > \cos w_1 > \cdots > \cos w_t; \qquad \cos w_k = -\cos w_{t-k}, \quad k = 0, 1, \ldots, t.$$

Then we have the following generalization of the Theorem: when $x$ increases from $-\infty$ to $\infty$, the ground state belongs consecutively to the classes

$$\Sigma_0^{(u)} \to \Sigma_1^{(u)} \to \cdots \to \Sigma_{k_{\max}}^{(u)}.$$

The $k$th rebuilding of the ground state ($\Sigma_{k-1}^{(u)} \to \Sigma_k^{(u)}$) occurs at the point $x_k$ of the intersection of the straight lines $L_{k-1}(x)$ and $L_k(x)$:

$$x_k = \frac{p}{2}\left(1 + \frac{q}{q + p(\cos w_{k-1} + \cos w_k)}\right), \qquad k = 1, 2, \ldots, k_{\max}. \qquad (12)$$



If $x_t > p/2$, all the rebuildings take place ($k_{\max} = t$). If $x_t < p/2$, the rebuildings stop when the denominator in Eq. (12) becomes negative.
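For the rotated patterns, the classes and the points (12) depend only on the vector $u$. The sketch below (plain Python; function names are hypothetical) lists the distinct values of $\cos w$ from Eq. (11), evaluates $x_k$ by Eq. (12), and checks that $L_{k-1}$ and $L_k$ of Eq. (8) indeed meet there. As an assumed example it uses a planar rotation by an angle $\theta$ in the first $p = 2$ coordinates, for which $u = (\cos\theta - \sin\theta, \sin\theta + \cos\theta)$; with $u = (1, \ldots, 1)$ the values reduce to those of Eq. (9).

```python
import math
from itertools import product


def cos_values(u, digits=12):
    """Distinct values of cos w = (sigma', u)/p over sigma' in {+1,-1}^p, Eq. (11),
    sorted in decreasing order (rounded to merge floating-point duplicates)."""
    p = len(u)
    vals = {round(sum(s * ui for s, ui in zip(sig, u)) / p, digits)
            for sig in product((1, -1), repeat=p)}
    return sorted(vals, reverse=True)


def line(p, q, c, x):
    """L(x) of Eq. (8) for a class with cos w = c."""
    a = q + p * c
    return a * a - 2 * x * a * c


def rebuild_points(u, q):
    """x_k of Eq. (12) for k = 1..t; meaningful only while the denominator is positive."""
    p = len(u)
    c = cos_values(u)
    return [p / 2 * (1 + q / (q + p * (c[k - 1] + c[k]))) for k in range(1, len(c))]


theta, q = 0.3, 4  # assumed example: 2x2 rotation block, q = 4 untouched coordinates
u = [math.cos(theta) - math.sin(theta), math.sin(theta) + math.cos(theta)]
c = cos_values(u)
for k, xk in enumerate(rebuild_points(u, q), start=1):
    gap = line(2, q, c[k - 1], xk) - line(2, q, c[k], xk)
    print(k, round(xk, 4), abs(gap) < 1e-9)
```

In this example every denominator in (12) is positive, so $x_t > p/2$ and all $t = 3$ rebuildings occur.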

Note. The compositions of the classes $\Sigma_k^{(u)}$ are determined by the values of $\{u_i\}$ only. But the choice of $\{u_i\}$ is completely in our hands! Hence we can create a Hopfield-type network with a preassigned set of fixed points.

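3°. The memorized patterns can be obtained from $\varepsilon(n)$ by identical synchronous distortions of $m$ of its coordinates. Suppose $n = p \times m + q$ and the matrix $S$ is

$$S = \begin{pmatrix}
\overbrace{1-x\ \cdots\ 1-x}^{m} & 1\ \cdots\ 1 & \cdots & 1\ \cdots\ 1 & 1\ \cdots\ 1\\
1\ \cdots\ 1 & 1-x\ \cdots\ 1-x & \cdots & 1\ \cdots\ 1 & 1\ \cdots\ 1\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
1\ \cdots\ 1 & 1\ \cdots\ 1 & \cdots & 1-x\ \cdots\ 1-x & \underbrace{1\ \cdots\ 1}_{q}
\end{pmatrix}.$$

The "suspicious" configuration vectors are the piecewise constant vectors

$$\sigma^* = (\underbrace{\sigma_1, \ldots, \sigma_1}_{m}, \underbrace{\sigma_2, \ldots, \sigma_2}_{m}, \ldots, \underbrace{\sigma_p, \ldots, \sigma_p}_{m}, 1, \ldots, 1); \qquad (13)$$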

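the values of the functional are

$$F(\sigma^*) \propto x^2 - 2x\cos w\left(\frac{q}{m} + p\cos w\right) + \left(\frac{q}{m} + p\cos w\right)^2,$$

where, as in Eq. (7), $w$ is the angle between $\sigma' = (\sigma_1, \sigma_2, \ldots, \sigma_p)$ and $\varepsilon(p)$.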

Again the vectors (13) are grouped into classes $\Sigma_k^{(m)}$, whose structure is similar to that of the classes $\Sigma_k$. Then we have the following generalization of the Theorem: the value of the parameter $x$ corresponding to the $k$th rebuilding of the ground state ($\Sigma_{k-1}^{(m)} \to \Sigma_k^{(m)}$) is

$$x_k = p\,\frac{n/m - 2k + 1}{n/m + p - 2(2k - 1)}, \qquad k = 1, 2, \ldots, k_{\max}, \qquad k_{\max} = \min\left(p, \left[\frac{n/m + p + 2}{4}\right]\right).$$
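For the block distortions of item 3°, the bookkeeping of the Basic Model carries over with $q$ replaced by $q/m$: each overlap equals $m[(q/m + p\cos w) - x\sigma_l]$, so $F(\sigma^*) = m^2 p\,[x^2 - 2x\cos w\,(q/m + p\cos w) + (q/m + p\cos w)^2]$, and the rebuilding points follow from Eq. (9) with $n$ replaced by $n/m = p + q/m$. The exact-arithmetic sketch below (plain Python; function names hypothetical) checks both statements on one small assumed example, $p = 3$, $m = 2$, $q = 3$, for which $k_{\max} = 2$.

```python
from fractions import Fraction
from itertools import product


def block_energy(p, m, q, x, sigma_p):
    """F(sigma*) computed directly from the block-distorted patterns; sigma* is (13)."""
    n = p * m + q
    sigma = [s for s in sigma_p for _ in range(m)] + [1] * q
    total = 0
    for l in range(p):
        row = [1 - x if l * m <= i < (l + 1) * m else 1 for i in range(n)]
        total += sum(a * b for a, b in zip(row, sigma)) ** 2
    return total


def block_closed_form(p, m, q, x, sigma_p):
    """m^2 p [x^2 - 2x cos w (q/m + p cos w) + (q/m + p cos w)^2]."""
    cos_w = Fraction(sum(sigma_p), p)
    a = Fraction(q, m) + p * cos_w
    return m * m * p * (x * x - 2 * x * a * cos_w + a * a)


def block_rebuild_point(p, m, q, k):
    """x_k of Eq. (9) with n replaced by n/m = p + q/m."""
    n_eff = Fraction(p * m + q, m)
    return p * (n_eff - 2 * k + 1) / (n_eff + p - 2 * (2 * k - 1))


p, m, q = 3, 2, 3
x = Fraction(7, 3)
print(all(block_energy(p, m, q, x, s) == block_closed_form(p, m, q, x, s)
          for s in product((1, -1), repeat=p)))  # True
for k in (1, 2):
    xk = block_rebuild_point(p, m, q, k)
    prev = (-1,) * (k - 1) + (1,) * (p - k + 1)  # a vector of class k-1
    cur = (-1,) * k + (1,) * (p - k)             # a vector of class k
    print(k, xk, block_energy(p, m, q, xk, prev) == block_energy(p, m, q, xk, cur))
# 1 21/11 True
# 2 3 True
```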
Basic Model: Consequences and Interpretations



1°.

$$x_1 = p\,\frac{n - 1}{n + p - 2} \ \ge\ \frac{p}{2}.$$



In fact, $x_1$ is the boundary of the distortions up to which the network correctly reproduces the standard from its distorted copies. This boundary depends on the length $p$ of the learning sequence. Of course, the network is trained correctly whenever the distortion does not exceed $p/2$, since $x_1 \ge p/2$.



2°. $x_1$ is a monotonically increasing function of $p$ and $n$.
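Items 1° and 2° are easy to explore numerically. The sketch below (plain Python; `x1` and `min_sequence_length` are hypothetical names) evaluates $x_1$ exactly and finds, for fixed $n$ and $x$, the smallest length $p$ of the learning sequence that pushes $x_1$ above $x$. The growth in $n$ is strict only for $p \ge 2$, since $x_1 = 1$ for $p = 1$.

```python
from fractions import Fraction


def x1(p, n):
    """First rebuilding point x_1 = p(n - 1)/(n + p - 2), Eq. (9) with k = 1 (n >= 2)."""
    return Fraction(p * (n - 1), n + p - 2)


def min_sequence_length(n, x):
    """Smallest p in 1..n with x_1(p, n) > x, i.e. the shortest learning sequence for
    which distortion x still leaves eps(n) as the ground state; None if there is none."""
    for p in range(1, n + 1):
        if x1(p, n) > x:
            return p
    return None


print(x1(3, 20), x1(4, 20), min_sequence_length(20, 3))  # 19/7 38/11 4
```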



Let $n$ and $x$ be fixed. Merely by increasing $p$, the boundary $x_1$ can be forced to exceed $x$ (if $x$ is not too large). As a result, $x$ turns out to lie to the left of the new position of $x_1$, i.e. in the region where the only fixed point is the standard $\varepsilon(n)$. In other words, merely by increasing the length $p$ of the learning sequence we can force the network to understand correctly "the truth" it is being taught. This agrees with practical experience: the longer the learning sequence, the better the signal can be read through the noise.



Let $p$ and $x$ be fixed. As above, merely by increasing the number $n$, the value of $x_1$ can be forced to exceed $x$ (provided $x < p$, because $x_1 \to p$ as $n \to \infty$). This result is reasonable too: if $p$ is fixed, the greater $n$ is, the smaller the relative weight $p/n$ of the distorted coordinates. Naturally, the smaller the relative distortion, the better the result of the learning should be.



3°. $x_{(p+1)/2} = p$. (Without loss of generality we assume that $p$ is odd.)



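Here we change the notation for the standard $\varepsilon(n)$ and introduce another standard $\varepsilon^{(-)}$:

$$\varepsilon^{(+)} = (\underbrace{1, 1, \ldots, 1}_{p}, 1, \ldots, 1) = \varepsilon(n), \qquad \varepsilon^{(-)} = (\underbrace{-1, -1, \ldots, -1}_{p}, 1, \ldots, 1).$$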

In their non-coinciding parts the standards $\varepsilon^{(+)}$ and $\varepsilon^{(-)}$ are opposite to each other, i.e. they are two opposite "statements". Any of the network fixed points $\sigma^*$ (5) is an intermediate statement between $\varepsilon^{(+)}$ and $\varepsilon^{(-)}$, which is drawn towards either one end of the scale or the other. And the network "feels" this.

Indeed, when the distortion $x$ is not very large ($x < p$), the number $k$ of the ground-state class does not exceed $p/2$, and the ground state resembles $\varepsilon^{(+)}$ more than $\varepsilon^{(-)}$. In other words, the memorized patterns are interpreted by the network as distorted copies of the standard $\varepsilon^{(+)}$. But if during the learning stage the distortion exceeds $p$ ($x > p$), the number of the ground-state class exceeds $p/2$ and the ground state resembles $\varepsilon^{(-)}$. Now the network interprets the memorized patterns as distorted copies of another standard, $\varepsilon^{(-)}$.

This agrees with practical experience: we regard deviations in the image of a standard as permissible only up to some boundary. Once this boundary is exceeded, the patterns are interpreted as distortions of a quite different standard. For the network of the considered type this boundary is $p$.

One more argument in support of this interpretation: from Eq. (9) it is easy to see that when $p = \text{const}$ and $n \to \infty$, all $x_k$ stick to one point,

$$x_k = p, \qquad k = 1, 2, 3, \ldots, p;$$

then for $x < p$ the ground state belongs to the class $\Sigma_0 = \{\varepsilon^{(+)}\}$, whereas for $x > p$ it belongs to the class $\Sigma_p = \{\varepsilon^{(-)}\}$.
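Both limits follow directly from Eq. (9): $x_k - p = p\,(2k - 1 - p)/(n + p - 2(2k - 1))$, which vanishes for $k = (p+1)/2$ and tends to 0 for every fixed $k$ as $n \to \infty$. A minimal exact-arithmetic sketch in plain Python (the values of $n$ in the loop are arbitrary examples, chosen so that $k_{\max} = p$):

```python
from fractions import Fraction


def x_k(p, n, k):
    """Rebuilding point of Eq. (9)."""
    return Fraction(p * (n - 2 * k + 1), n + p - 2 * (2 * k - 1))


p = 5  # odd, as assumed in item 3°
for n in (100, 1000, 100_000):
    middle = x_k(p, n, (p + 1) // 2)  # equals p for every n > p
    spread = max(abs(x_k(p, n, k) - p) for k in range(1, p + 1))
    print(n, middle, float(spread))
```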



4°.

$$k_{\max} = \begin{cases} p, & \text{when } p/n < 1/3,\\[4pt] \left[\dfrac{n + p + 2}{4}\right], & \text{when } p/n > 1/3. \end{cases}$$



So, $p$ is the boundary for the permissible distortions $x$. The question is: what do memorized patterns with large distortions $x$ mean? We treat the increase of $x$ above $p$ as a stronger and stronger negation of the standard $\varepsilon^{(+)}$, as if the network were trained on memorized patterns that deny the standard $\varepsilon^{(+)}$. In other words, the network is relearned by the presentation of negative examples.

There is a big difference, clear to everybody, between relearning with the help of negative examples and learning the opposite truth. Relearning is characterized by some specific difficulties: (1) the better the incorrect truth has been understood, the more difficult (and sometimes even impossible) it is to correct it; (2) it is comparatively easy to correct the result slightly, but much more difficult to revise it fundamentally, etc. We think that the dependence of $k_{\max}$ on $p$ reflects exactly these problems.

When the number $p$ of the parameters that have to be corrected is not very large ($p/n < 1/3$), the network can be relearned by simple presentation of negative examples. In this case $k_{\max} = p$, and when "the denial" of the standard $\varepsilon^{(+)}$ is rather strong ($x > x_p$), the network understands the opposite standard $\varepsilon^{(-)}$ as "a new" truth. But if the number of the corrected parameters is large ($p/n > 1/3$), presenting negative examples is not sufficient to relearn the network. In this case $\frac{p+1}{2} \le k_{\max} < p$, and however large $x$ is ($x_{k_{\max}} < x < \infty$), the network understands as a new truth not the opposite standard $\varepsilon^{(-)}$, but one of the statements intermediate between $\varepsilon^{(+)}$ and $\varepsilon^{(-)}$. The understood truth is nevertheless drawn towards $\varepsilon^{(-)}$, since $k_{\max} > p/2$.

Of course, our interpretation is open to discussion. But it seems that real life offers many examples that confirm our conception.



New Results (in preparation)

The generalization of the Basic Model to the following cases:

1°. The functional $F(\sigma)$ in the problem (1)-(3) has the form

$$F(\sigma) = \sum_{i,j=1}^{n} J_{ij}\,\sigma_i\sigma_j + h\sum_{i=1}^{n}\sigma_i.$$

In physics such a linear term describes the magnetic field.

In Fig. 2 the straight lines $h_k(x)$ divide the $(x, h)$ plane into the regions where the ground state belongs to the different classes $\Sigma_k$:

$$h_{k-1}(x) = 2p\,(n - 2k + 1)\left(\frac{x}{x_k} - 1\right), \qquad k = 1, 2, \ldots$$
[Fig. 2: the straight lines $h_0(x)$, $h_1(x)$, … in the $(x, h)$ plane.]
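With the linear term, the class energies and the boundary lines can be checked in the same way as before. The sketch below assumes the form $F(\sigma) = \sum J_{ij}\sigma_i\sigma_j + h\sum\sigma_i$ (field coefficient 1), uses the closed form $p[x^2 - 2x(q + p\cos w_k)\cos w_k + (q + p\cos w_k)^2]$ for the quadratic part of a class-$\Sigma_k$ vector, for which $\sum_i \sigma_i = n - 2k$, and verifies that on the line $h = h_{k-1}(x)$ the classes $\Sigma_{k-1}$ and $\Sigma_k$ have equal energy. Function names are hypothetical; plain Python with exact fractions.

```python
from fractions import Fraction


def class_energy(p, n, x, h, k):
    """F for a vector of class Sigma_k, including the field term h * sum(sigma_i)."""
    q = n - p
    cos_w = Fraction(p - 2 * k, p)
    a = q + p * cos_w  # sum of sigma_i = n - 2k
    return p * (x * x - 2 * x * a * cos_w + a * a) + h * a


def x_k(p, n, k):
    """Rebuilding point of Eq. (9) (zero field)."""
    return Fraction(p * (n - 2 * k + 1), n + p - 2 * (2 * k - 1))


def h_line(p, n, k, x):
    """h_{k-1}(x) = 2p(n - 2k + 1)(x / x_k - 1): boundary between Sigma_{k-1} and Sigma_k."""
    return 2 * p * (n - 2 * k + 1) * (x / x_k(p, n, k) - 1)


p, n, x = 3, 6, Fraction(5, 2)
for k in (1, 2):
    h = h_line(p, n, k, x)
    print(k, h, class_energy(p, n, x, h, k - 1) == class_energy(p, n, x, h, k))
# 1 5 True
# 2 -3 True
```

At $h = 0$ each line $h_{k-1}(x)$ crosses the $x$ axis exactly at $x_k$, so the zero-field Theorem is recovered as the slice $h = 0$ of Fig. 2.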



2°. The distortions $x_l$ are different for all the memorized patterns (4):

$$s^{(l)} = (1, \ldots, 1, 1 - x_l, 1, \ldots, 1), \qquad l = 1, 2, \ldots, p.$$

Suppose

$$x_1 > x_2 > \cdots > x_p > 0.$$



It can be shown, firstly, that the ground state can only be one of the configuration vectors

$$\sigma^*(k) = (\underbrace{-1, -1, \ldots, -1}_{k}, 1, \ldots, 1), \qquad k = 0, 1, \ldots, k_{\max}.$$

Here $k_{\max} = p$ when $p$ is small enough compared with $n$, and $k_{\max} < p$ otherwise.

And secondly, for the vector $\sigma^*(k)$ to be a ground state, the inequalities

$$x_{k+1} - \frac{\sum_{i=1}^{k} x_i - \sum_{j=k+2}^{p} x_j}{n - 2k - 1} \ \le\ p\ \le\ x_k - \frac{\sum_{i=1}^{k-1} x_i - \sum_{j=k+1}^{p} x_j}{n - 2k + 1}$$

are necessary and sufficient.

When $p$ is fixed and $n \to \infty$, these inequalities become much simpler:

$$x_{k+1} < p < x_k.$$
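For distinct distortions the claim can be tested by brute force on a small case. The sketch below (plain Python; names hypothetical) uses $(s^{(l)}, \sigma) = \sum_i \sigma_i - x_l\sigma_l$, searches all $\sigma$ with $\sigma_n$ fixed to $+1$ (since $F(\sigma) = F(-\sigma)$), and compares the result with the two-sided inequalities above, indexed so that `xs[0]` is $x_1$. The example values $x = (7, 5, 2)$, $n = 8$ are arbitrary; they keep both denominators $n - 2k \mp 1$ positive.

```python
from itertools import product


def energy(xs, sigma):
    """F(sigma) = sum_l (sum(sigma) - x_l sigma_l)^2 for patterns with distortions xs."""
    total_spin = sum(sigma)
    return sum((total_spin - xl * sigma[l]) ** 2 for l, xl in enumerate(xs))


def ground_states(xs, n, tol=1e-9):
    """Exhaustive search over {+1, -1}^n with sigma_n fixed to +1."""
    best, argmax = float("-inf"), []
    for head in product((1, -1), repeat=n - 1):
        sigma = head + (1,)
        f = energy(xs, sigma)
        if f > best + tol:
            best, argmax = f, [sigma]
        elif abs(f - best) <= tol:
            argmax.append(sigma)
    return argmax


def satisfies_inequalities(xs, n, k):
    """Two-sided condition on p for sigma*(k); xs sorted in decreasing order."""
    p = len(xs)
    if k < p:  # lower bound: comparison with sigma*(k+1)
        lower = xs[k] - (sum(xs[:k]) - sum(xs[k + 1:])) / (n - 2 * k - 1)
        if lower > p:
            return False
    if k > 0:  # upper bound: comparison with sigma*(k-1)
        upper = xs[k - 1] - (sum(xs[:k - 1]) - sum(xs[k:])) / (n - 2 * k + 1)
        if p > upper:
            return False
    return True


xs, n = (7, 5, 2), 8
print(ground_states(xs, n))  # [(-1, -1, 1, 1, 1, 1, 1, 1)]
print([k for k in range(len(xs) + 1) if satisfies_inequalities(xs, n, k)])  # [2]
```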

Note that for sufficiently large $n$ and $p > x_1$, the standard $\varepsilon(n) = \sigma^*(0)$ is the ground state. It seems that, for the Hebb connection matrix, the last result clarifies the meaning of the well-known Latin saying "Repetitio est mater studiorum": by showing the same pattern many times (inevitably with a distortion each time), we seek to make the number of presentations $p$ greater than the maximal distortion.